Hitting the Right Paraphrases in Good Time
Abstract
We present a random-walk-based approach to learning paraphrases from bilingual parallel corpora. The corpora are represented as a graph in which each node corresponds to a phrase, and an edge joins two nodes if their phrases are aligned in a phrase table. We sample random walks to estimate hitting times, that is, the average number of steps a walk needs to reach one node from another, and use them to rank paraphrases so that better ones are “closer” to the phrase of interest. The approach allows “feature” nodes that represent domain knowledge to be built into the graph, and it uses truncation to keep the graph from growing too large to process efficiently. Current approaches, by contrast, implicitly assume that the graph is bipartite, are limited to finding paraphrases two steps away from a phrase, and generally do not permit easy incorporation of domain knowledge. Manual evaluation of the generated output shows that our approach outperforms the state-of-the-art system of Callison-Burch (2008).
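The abstract only outlines the method, so the following is a minimal Python sketch of the ideas it names, under stated assumptions rather than the authors' implementation. It builds an undirected weighted phrase graph from a toy phrase table (the phrases, weights, the `en:`/`fr:`/`feat:` node prefixes and all function names are hypothetical), applies one simple form of truncation by keeping only nodes within `max_depth` hops of the query, and estimates truncated hitting times by Monte Carlo: `num_walks` walks of length `T` start at the query, and each node's estimate is its average first-visit step, counting `T` for walks that never reach it. Measuring hitting time from the query to the candidates (rather than the reverse) and choosing transitions in proportion to alignment weight are assumptions, not details confirmed by the abstract.

```python
import random
from collections import defaultdict, deque

# Hypothetical toy phrase table: (English phrase, foreign phrase, alignment weight).
PHRASE_TABLE = [
    ("thrown into jail", "festgenommen", 0.6),
    ("arrested", "festgenommen", 0.7),
    ("arrested", "verhaftet", 0.8),
    ("detained", "verhaftet", 0.5),
    ("detained", "inhaftiert", 0.4),
    ("imprisoned", "inhaftiert", 0.9),
]


def build_graph(phrase_table):
    """One node per phrase; an undirected weighted edge per aligned phrase pair."""
    graph = defaultdict(dict)
    for en, fr, w in phrase_table:
        u, v = "en:" + en, "fr:" + fr
        graph[u][v] = graph[u].get(v, 0.0) + w
        graph[v][u] = graph[v].get(u, 0.0) + w
    return graph


def add_feature_node(graph, feature, phrases, weight=1.0):
    """Hypothetical domain-knowledge node linked to each listed phrase node."""
    f = "feat:" + feature
    for p in phrases:
        graph[f][p] = weight
        graph[p][f] = weight


def truncate(graph, source, max_depth):
    """Keep only nodes within max_depth hops of source (breadth-first search)."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if depth[u] == max_depth:
            continue  # do not expand past the depth limit
        for v in graph[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    # Drop edges that leave the kept node set.
    return {u: {v: w for v, w in graph[u].items() if v in depth} for u in depth}


def hitting_times(graph, source, T, num_walks, rng):
    """Monte Carlo truncated hitting times from source to every node."""
    totals = defaultdict(float)
    for _ in range(num_walks):
        first_hit = {source: 0}
        node = source
        for step in range(1, T + 1):
            nbrs = graph[node]
            if not nbrs:
                break  # isolated node: the walk cannot move
            # Move to a neighbour with probability proportional to edge weight.
            node = rng.choices(list(nbrs), weights=list(nbrs.values()))[0]
            first_hit.setdefault(node, step)
        for v in graph:
            totals[v] += first_hit.get(v, T)  # unreached within T steps counts as T
    return {v: totals[v] / num_walks for v in graph}


def rank_paraphrases(graph, source, T=6, num_walks=2000, seed=0):
    """English candidates sorted by estimated hitting time (closest first)."""
    rng = random.Random(seed)
    h = hitting_times(graph, source, T, num_walks, rng)
    candidates = [v for v in graph if v.startswith("en:") and v != source]
    return sorted(candidates, key=lambda v: h[v])


query = "en:thrown into jail"
graph = truncate(build_graph(PHRASE_TABLE), query, max_depth=4)
print(rank_paraphrases(graph, query))  # → ['en:arrested', 'en:detained']
```

On this toy table, `max_depth=4` cuts off `en:imprisoned` (six hops from the query), while `en:detained`, four hops away and so out of reach of a method limited to two-step paraphrases, is still ranked, behind `en:arrested`. A hypothetical feature node added with `add_feature_node` before truncation would create extra paths between the phrases it links and so shift their hitting times.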
Similar resources
Efficient Computation of Mean Truncated Hitting Times on Very Large Graphs
Previous work has shown the effectiveness of random walk hitting times as a measure of dissimilarity in a variety of graph-based learning problems such as collaborative filtering, query suggestion or finding paraphrases. However, application of hitting times has been limited to small datasets because of computational restrictions. This paper develops a new approximation algorithm with which hit...
Aid Effectiveness in the Sustainable Development Goals Era; Comment on ““It’s About the Idea Hitting the Bull’s Eye”: How Aid Effectiveness Can Catalyse the Scale-up of Health Innovations”
Over just a six-year period from 2005-2011, five aid effectiveness initiatives were launched: the Paris Declaration on Aid Effectiveness (2005), the International Health Partnership plus (2007), the Accra Agenda for Action (2008), the Busan Partnership for Effective Cooperation (2011), and the Global Partnership for Effective Development Cooperation (GPEDC) (2011). More recently, in 2015, the A...
External Plagiarism Detection based on Human Behaviors in Producing Paraphrases of Sentences in English and Persian Languages
With the advent of the internet and easy access to digital libraries, plagiarism has become a major issue. Using search engines is one plagiarism detection technique; it converts plagiarism patterns into search queries. Generating suitable queries is the heart of this technique, and existing methods suffer from a lack of accurate queries and of precision and speed in retrieved result...
Using Paraphrases of Deep Semantic Representations to Support Regression Testing in Spoken Dialogue Systems
Rule-based spoken dialogue systems require a good regression testing framework if they are to be maintainable. We argue that there is a tension between two extreme positions when constructing the database of test examples. On the one hand, if the examples consist of input/output tuples representing many levels of internal processing, they are fine-grained enough to catch most processing errors, ...
Aligning Predicate-Argument Structures for Paraphrase Fragment Extraction
Paraphrases and paraphrasing algorithms have been found to be of great importance in various natural language processing tasks. While most paraphrase extraction approaches extract equivalent sentences, sentences are an inconvenient unit for further processing because they are too specific and often not exact paraphrases. Paraphrase fragment extraction is a technique that post-processes sentential p...